Node.js cheerio module: extracting HTML page content
Table of Contents
1. Node.js cheerio module extracts HTML page content
1.1. Find the target element
1.2. Beautify Text output
1.3. Extract the answer text
1.4. Final Code
This article gives an example of using the cheerio module to extract specified content from an
Contents
Foreword
Example
Example requirements
Collector
Adding a proxy
Requesting HTTPS
Afterword ...
Foreword
Many people need to collect data, and it can be done in different languages and in different ways. I used to write collectors in C#; mostly that meant sending all kinds of requests and parsing the data with regular expressions, which was rather cumbersome. Overall there was nothing wrong with it, but efficiency was poor. Using Node.js to write a colle
develop Webserver/websocket code, I find it can also meet some of my daily scripting needs. So I started looking for a ready-made Node.js library or tool for web scraping and, sure enough, I found cheerio. cheerio is a Node.js library that builds a DOM structure from a piece of HTML and then provides jQuery-like CSS selector queries. Very good! Because, in this world, CSS and CSS-driven style lis
Node.js crawler with superagent and cheerio
Preface
I heard of crawlers a long time ago. I started learning Node.js a few days ago and wrote a crawler that collects the article titles, user names, read counts, recommendation counts, and user profiles from the home page of Cnblogs (Blog Garden). Here is a short summary.
These points are used:
1. Node core module: File System
2. Third-party module used for HTTP requests: superagent
3. The third-
Author: fbysss
QQ: Wine bar Bar I scattered
Blog:blog.csdn.net/fbysss
Statement: This is an original article by fbysss; please credit the source when reprinting
Preface
Crawling web pages is a time-consuming and tedious task. Because page formats differ, it is hard to rely entirely on automatic machine recognition.
In general, we can use CSS selectors to select DOM nodes and extract what we need from the entire page.
Front-end developers are probably most familiar with jQuery. If jQuery is
Objective:
Data acquisition
Write Local file backup
Building a Web server
Reading a file to a Web page for presentation
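The last three objectives above (write a local backup file, build a web server, read the file back into a page) can be sketched with Node's built-in `fs` and `http` modules alone. This is only an illustration under assumptions: the function names, the `{ title, href }` item shape, and the HTML layout are hypothetical and are not taken from the original article.

```javascript
const fs = require('fs');
const http = require('http');

// Write the collected items to a local JSON file (the backup step).
function saveBackup(filePath, items) {
  fs.writeFileSync(filePath, JSON.stringify(items, null, 2), 'utf8');
}

// Read the backup file back into memory.
function loadBackup(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Escape text before placing it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn the items into a simple HTML list (hypothetical item shape: { title, href }).
function renderPage(items) {
  const rows = items
    .map((item) => `<li><a href="${escapeHtml(item.href)}">${escapeHtml(item.title)}</a></li>`)
    .join('');
  return `<!doctype html><ul>${rows}</ul>`;
}

// Create (but do not start) a server that presents the backup file as a page.
function createBackupServer(filePath) {
  return http.createServer((req, res) => {
    try {
      const page = renderPage(loadBackup(filePath));
      res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(page);
    } catch (err) {
      res.writeHead(500, { 'Content-Type': 'text/plain; charset=utf-8' });
      res.end('Backup file could not be read');
    }
  });
}
```

To try it, call `saveBackup('data.json', items)` after collecting data and then `createBackupServer('data.json').listen(3000)`; the file name and port here are arbitrary choices.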
Directory structure:
The contents of the package.json file are the same as in the previous article, "Nodejs+request+cheerio Data acquisition".
request: https://github.com/request/request makes requests easier
cheerio: https://github.com/cheeriojs/cheerio is used to
Goal
Create a lesson3 project and write the code in it. When http://localhost:3000/ is opened in a browser, it outputs the titles and links of all posts on the CNode (https://cnodejs.org/) community home page as JSON.
Knowledge points:
Learn to crawl web pages using superagent
Learn to parse web pages using cheerio
Library introduction:
superagent (http://visionmedia.github.io/superagent/) is an HTTP library that can initiate
or refer to this article: http://cnodejs.org/topic/54bdaac4514ea9146862abee
In addition, see the article above on some experience from scraping NetEase Open Courses with Node.js.
The code is as follows. Note that it uses http to fetch page results, request for HTTP requests, cheerio for parsing, mkdirp to create directories, fs to create files, and iconv-lite for encoding conversion (not required for this example).
Curl.js:
/***/= require ("http"), function download (URL, callback) {= [];
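The Curl.js listing is cut off in the text above, so the following is not a reconstruction of it. It is only a sketch of what a callback-style `download(url, callback)` helper commonly looks like with Node's built-in `http` module: collect the response chunks, join them, and pass the result to the callback. The status-code check and the UTF-8 decoding are assumptions of this sketch.

```javascript
const http = require('http');

// Fetch a page over plain HTTP and call callback(err, body) exactly once.
function download(url, callback) {
  http
    .get(url, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        callback(new Error(`Unexpected status code: ${res.statusCode}`));
        return;
      }
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        // Assumes the page is UTF-8; pages in other encodings would need the
        // raw Buffer decoded differently (the text mentions iconv-lite for that).
        callback(null, Buffer.concat(chunks).toString('utf8'));
      });
    })
    .on('error', (err) => callback(err));
}
```

The returned string can then be handed to an HTML parser such as cheerio.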
1. Modules used
(1) superagent: an HTTP request library for Node.js (every language has countless equivalents, such as OkHttp for Java and AFNetworking for iOS)
(2) cheerio: an HTML parsing library for Node.js (basically every language has one)
(3) async: a library for running functions in parallel or asynchronously in Node.js (very powerful; other languages do not have many libraries of this type)
2. Crawling content
The Duowan League of Legends heroes page: parse the URL of each hero on the page, then request each hero's detailed data and extract the required d
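To illustrate why a concurrency helper matters when requesting many detail pages, here is a hand-written sketch of limited-concurrency mapping using plain Promises. It is a stand-in for the idea behind the async library's `mapLimit`, not the async library itself, and the `worker` function is whatever per-URL request logic the crawler needs.

```javascript
// Run worker(item, index) for every item, with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each runner keeps taking the next unstarted item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `mapLimit(heroUrls, 5, fetchHeroDetail)` would keep at most five requests open at a time; `heroUrls` and `fetchHeroDetail` are hypothetical names.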
Learning HTML5 series: Online/Offline (online status detection)
I. Opening Analysis
Hi, everybody, and belated New Year wishes! Here we meet again. This series of articles mainly focuses on HTML5-related knowledge points, taking the API as the starting point and introducing examples from a simple
The most natural learning rule is to use, in each round, any vector that has the least loss over the past rounds. This is in the same spirit as the Consistent algorithm; in online convex optimization it is commonly referred to as Follow-the-Leader, which minimizes the cumulative loss.
For any t: We talked about the ability to minimize cumulative losses that cannot be explained by this algorithm in an online learning scena
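For reference, the Follow-the-Leader rule described in words above is usually written as follows. The symbols are assumptions that do not appear in the text: $S$ is the set of allowed vectors, $f_i$ is the loss function revealed in round $i$, and $w_t$ is the vector chosen in round $t$.

```latex
w_t \in \operatorname*{arg\,min}_{w \in S} \; \sum_{i=1}^{t-1} f_i(w)
```

In words: in round $t$, pick any vector that minimizes the total loss over rounds $1$ through $t-1$.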
jQuery-based online PPT production and online demo effects
This jQuery-based code for making PPT slides online, with an online demo, is a polished JavaScript slide-production effect. As follows:
Online Library Series 4-Online Playback End Edition
1. Reference the FlexPaper player.
The original FlexPaper player file that you bring into your project is really just a swf. I repackaged it as a container, using my own icon and my homepage address, but made no substantial changes (here I pay tribute to the expert behind this excellent technology), and encapsulated the ActionScript and styles into this swf. I will publ
Oracle online redo log files (online redo log file)
Almost all of the internal changes that occur in Oracle are recorded in the online redo log files, and Oracle uses these redo log groups to recover the database, so they are very important.
The main work for online redo log files is:
Back up your data,
1: Record all data ch
Mall
A friend who has worked in a traditional industry for many years wants to move into the internet industry and asked me how to use his real-world resources to build a quickly profitable website at low cost. Because I have always thought that e-commerce is the overall trend of development, we talked about e-commerce profit models. Here are a few questions from our conversation:
(This article is about the network profit model, not literally)
What is the biggest difference betwe
Original: How SQL Server 2014 improves online operations that are not truly online
In today's article, I would like to talk about online index rebuild operations and how they have been improved in SQL Server 2014. As we all know, the online index rebuild operation was introduced in SQL Server 2005. However, these online operations are not really online, because SQL Server needs to
Hi QQ, MSN, Baidu, Ali Wangwang, Tradelink, MSN online customer service, online chat code
1. How to display Tencent QQ online chat on a web page?
QQ online consulting code (QQ online chat code/QQ online customer service code/QQ online